Filled Pause Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

August 12th, 2012

First Stage of Transcription of CCHP Recordings Has Begun

I recently met with the research support staff to go over the procedures for transcribing all the recordings in the Crosslinguistic Corpus of Hesitation Phenomena (CCHP). There are roughly 9 hours of recordings, which will be transcribed in several stages. The first stage, probably the most arduous, is to transcribe every word in each recording, with only minimal segmentation into utterances. The staff will also transcribe all overt hesitation phenomena, including filled pauses, false starts, repair sequences, and repeats (see [Taxonomy](http://filledpause.com/taxonomy) for details). Other hesitation phenomena, such as silent pauses and lengthenings, will be detected and marked using automated procedures. Each recording will be transcribed independently by two staff members, and I will then resolve any differences between the two transcriptions.
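As a rough illustration of the reconciliation step, the sketch below uses Python's standard `difflib` to list the word-level spans where two independent transcriptions of the same utterance disagree. It assumes plain, whitespace-separated text; the function name and example sentences are hypothetical and not part of the actual CCHP workflow.

```python
from difflib import SequenceMatcher


def transcription_disagreements(transcript_a: str, transcript_b: str):
    """Return word-level spans where two independent transcriptions differ.

    Each item is (operation, words_from_a, words_from_b), where operation is
    'replace', 'delete' (words only in A), or 'insert' (words only in B).
    """
    words_a = transcript_a.split()
    words_b = transcript_b.split()
    # autojunk=False keeps frequent words like "the" or "uh" from being ignored
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for op, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if op != "equal":
            disagreements.append(
                (op, words_a[a_start:a_end], words_b[b_start:b_end])
            )
    return disagreements


# Hypothetical example: transcriber A heard a filled pause, B heard a repeat.
a = "well um I went to the uh the store"
b = "well I went to the uh the the store"
for diff in transcription_disagreements(a, b):
    print(diff)
# ('delete', ['um'], [])
# ('insert', [], ['the'])
```

Each reported span is a candidate for manual adjudication; spans where the two transcribers agree never need a second look.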

I anticipate that this procedure will take several weeks, and probably months, to finish. Once it is complete, we can move on to the second stage of transcription: annotating the transcripts with pause and word interval durations. Ideally, this will be finished before the end of the project year (March 2013).
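The second stage amounts to attaching timing information to the transcript. Assuming each word has already been aligned to start and end times (the post does not say how this will be done), word durations and the silent gaps between words follow from simple subtraction. The sketch below is hypothetical: the `(word, start, end)` tuple format, the function name, and the 0.25 s minimum pause (a cutoff often used in pause research, but not one stated for this project) are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical input format: (word, start_seconds, end_seconds), in time order.
WordInterval = Tuple[str, float, float]


def annotate_pauses(words: List[WordInterval], min_pause: float = 0.25):
    """Return per-word durations and the silent gaps between consecutive words.

    Gaps shorter than min_pause (an assumed threshold) are not reported.
    """
    word_durations = [(w, round(end - start, 3)) for w, start, end in words]
    pauses = []
    # Pair each word with the next one and measure the silence between them.
    for (w1, _, end1), (w2, start2, _) in zip(words, words[1:]):
        gap = round(start2 - end1, 3)
        if gap >= min_pause:
            pauses.append((w1, w2, gap))
    return word_durations, pauses


# Hypothetical aligned utterance with a silent pause before a filled pause.
words = [("I", 0.00, 0.15), ("went", 0.15, 0.42), ("um", 0.90, 1.30), ("home", 1.32, 1.70)]
durations, pauses = annotate_pauses(words)
print(durations)  # [('I', 0.15), ('went', 0.27), ('um', 0.4), ('home', 0.38)]
print(pauses)     # [('went', 'um', 0.48)]
```

Note that the filled pause "um" is treated as an ordinary word here, so its duration is recorded alongside the silent pause that precedes it.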

As the transcription files are completed and converted into a distributable format (XML), they will be uploaded to the web archive and made available alongside the original recordings. Watch the website for updates on availability.
